
    Numerical Implementation of lepton-nucleus interactions and its effect on neutrino oscillation analysis

    We discuss the implementation of a nuclear model based on realistic nuclear spectral functions in the GENIE neutrino interaction generator. Besides improving on the Fermi gas description of the nuclear ground state, our scheme involves a new prescription for $Q^2$ selection, meant to efficiently enforce energy-momentum conservation. The results of our simulations, validated through comparison to electron scattering data, have been obtained for a variety of target nuclei, ranging from carbon to argon, and cover the kinematical region in which quasi-elastic scattering is the dominant reaction mechanism. We also analyse the influence of the adopted nuclear model on the determination of neutrino oscillation parameters. (19 pages, 35 figures; version accepted by Phys. Rev.)

    Automated data pre-processing via meta-learning

    A data mining algorithm may perform differently on datasets with different characteristics: for example, it might perform better on a dataset with continuous attributes than on one with categorical attributes, or the other way around. As a matter of fact, a dataset usually needs to be pre-processed. Taking into account all the possible pre-processing operators, there exists a staggeringly large number of alternatives, and inexperienced users become overwhelmed. We show that this problem can be addressed by an automated approach, leveraging ideas from meta-learning. Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result of the algorithm on the respective dataset. Our approach will help non-expert users to more effectively identify the transformations appropriate to their applications, and hence to achieve improved results.
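The core idea, characterising datasets by meta-features and transferring pre-processing choices from similar, previously seen datasets, can be sketched minimally. Everything below (the toy meta-features, the 1-nearest-neighbour recommender, the synthetic history) is an illustrative stand-in under our own assumptions, not the paper's actual method:

```python
import numpy as np

def meta_features(X):
    """Toy meta-feature vector: attribute count, average spread, average offset."""
    return np.array([
        float(X.shape[1]),
        float(np.mean(np.std(X, axis=0))),
        float(np.mean(np.abs(np.mean(X, axis=0)))),
    ])

def recommend(X_new, history):
    """history: list of (meta_feature_vector, best_transformation_name) pairs.
    Recommend the transformation that worked best on the most similar
    previously seen dataset (1-nearest-neighbour in meta-feature space)."""
    phi = meta_features(X_new)
    dists = [np.linalg.norm(phi - m) for m, _ in history]
    return history[int(np.argmin(dists))][1]

rng = np.random.default_rng(0)
# Hypothetical "past experience": which operator helped which kind of dataset.
history = [
    (meta_features(rng.normal(5.0, 3.0, size=(100, 8))), "standardise"),
    (meta_features(rng.normal(0.0, 1.0, size=(100, 8))), "none"),
]
X_new = rng.normal(5.0, 3.0, size=(120, 8))
print(recommend(X_new, history))  # nearest past dataset shares the large offset/spread
```

A real meta-learner would replace the nearest-neighbour lookup with a trained meta-model and far richer meta-features, but the transfer-from-similar-datasets structure is the same.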

    Conditional Neural Relational Inference for Interacting Systems

    In this work, we want to learn to model the dynamics of similar yet distinct groups of interacting objects. These groups follow common physical laws but exhibit specificities that are captured through a vectorial description. We develop a model that allows us to perform conditional generation for any such group given its vectorial description. Unlike previous work on learning dynamical systems, which can only do trajectory completion and requires part of the trajectory dynamics to be provided as input at generation time, we generate using only the conditioning vector, with no access to trajectories at generation time. We evaluate our model in the setting of modelling human gait and, in particular, pathological human gait.

    Determining appropriate approaches for using data in feature selection

    Feature selection is increasingly important in data analysis and machine learning in the big data era. However, how to use the data in feature selection, i.e. whether to use ALL or only PART of a dataset, has become a serious and tricky issue. Whilst the conventional practice of using all the data in feature selection may lead to selection bias, using part of the data may, on the other hand, lead to underestimating the relevant features under some conditions. This paper investigates these two strategies systematically in terms of reliability and effectiveness, and then determines their suitability for datasets with different characteristics. Reliability is measured by the Average Tanimoto Index and the Inter-method Average Tanimoto Index, and effectiveness is measured by the mean generalisation accuracy of classification. The computational experiments are carried out on ten real-world benchmark datasets and fourteen synthetic datasets. The synthetic datasets are generated with a pre-set number of relevant features and varied numbers of irrelevant features and instances, with different levels of added noise. The results indicate that the PART approach is more effective in reducing the bias when the size of a dataset is small, but starts to lose its advantage as the dataset size increases.
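For concreteness, the Tanimoto (Jaccard) similarity between two selected feature subsets, and its average over all pairs of selection runs, can be sketched as follows. This is a minimal illustration of the Average Tanimoto Index idea, not the paper's full experimental protocol:

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature subsets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0

def average_tanimoto_index(subsets):
    """Average pairwise Tanimoto similarity over all runs: a stability score
    in [0, 1], where 1 means every run selected exactly the same features."""
    pairs = list(combinations(subsets, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Feature subsets selected in three hypothetical resampling runs.
runs = [{"f1", "f2", "f3"}, {"f1", "f2", "f4"}, {"f1", "f2", "f3"}]
print(round(average_tanimoto_index(runs), 3))  # → 0.667
```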

    Adjusted Measures for Feature Selection Stability for Data Sets with Similar Features

    For data sets with similar features, for example highly correlated features, most existing stability measures behave in an undesired way: they consider features that are almost identical but have different identifiers as different features. Existing adjusted stability measures, that is, stability measures that take into account the similarities between features, have major theoretical drawbacks. We introduce new adjusted stability measures that overcome these drawbacks. We compare them to each other and to existing stability measures on both artificial and real sets of selected features. Based on the results, we suggest using one new stability measure that considers highly similar features as exchangeable.
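The idea of treating highly similar features as exchangeable can be illustrated with an adjusted overlap in which a feature selected in one run also counts as matched if the other run selected a near-duplicate of it. This is a simplified sketch under an assumed feature-similarity function `sim` (e.g. absolute correlation), not one of the paper's actual measures:

```python
def adjusted_overlap(a, b, sim, threshold=0.9):
    """Greedy one-to-one matching of features in a to similar features in b.
    Two runs score 1.0 if every selected feature has a close counterpart."""
    unmatched = set(b)
    matched = 0
    for f in a:
        best = max(unmatched, key=lambda g: sim(f, g), default=None)
        if best is not None and sim(f, best) >= threshold:
            matched += 1
            unmatched.discard(best)
    return matched / max(len(a), len(b))

# Toy similarity: "x1a" and "x1b" stand for two near-duplicate features.
def sim(f, g):
    if f == g:
        return 1.0
    return 0.95 if f[:2] == g[:2] else 0.0

# A plain intersection would score 0.5 here; the adjusted overlap scores 1.0
# because x1a and x1b are treated as exchangeable.
print(adjusted_overlap({"x1a", "x2a"}, {"x1b", "x2a"}, sim))
```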

    Stable and Accurate Feature Selection


    Addressing the Challenge of Defining Valid Proteomic Biomarkers and Classifiers

    Background: The purpose of this manuscript is to provide, based on an extensive analysis of a proteomic data set, suggestions for proper statistical analysis for the discovery of sets of clinically relevant biomarkers. As a tractable example, we define the measurable proteomic differences between apparently healthy adult males and females. We chose urine as the body fluid of interest and CE-MS, a thoroughly validated platform technology, allowing for routine analysis of a large number of samples. The second urine of the morning was collected from apparently healthy male and female volunteers (aged 21-40) in the course of the routine medical check-up before recruitment at the Hannover Medical School.
    Results: We found that the Wilcoxon test is best suited for the definition of potential biomarkers. Adjustment for multiple testing is necessary. Sample size estimation can be performed based on a small number of observations via resampling from pilot data. Machine learning algorithms appear ideally suited to generate classifiers. Assessment of any result in an independent test set is essential.
    Conclusions: Valid proteomic biomarkers for diagnosis and prognosis can only be defined by applying proper statistical data mining procedures. In particular, a justification of the sample size should be part of the study design.
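The recipe in the Results section, a rank-based test per candidate marker followed by multiple-testing adjustment, can be sketched as below. The data are synthetic, and the choice of Benjamini-Hochberg as the adjustment is ours for illustration; the paper itself evaluates which procedures are best suited:

```python
import numpy as np
from scipy.stats import mannwhitneyu  # rank-based two-sample (Wilcoxon) test

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    order = np.argsort(p)
    q = np.empty(n)
    running_min = 1.0
    for rank in range(n - 1, -1, -1):       # walk from largest p to smallest
        i = order[rank]
        running_min = min(running_min, p[i] * n / (rank + 1))
        q[i] = running_min
    return q

rng = np.random.default_rng(1)
males = rng.normal(0.0, 1.0, size=(20, 30))    # 20 candidate markers x 30 samples
females = rng.normal(0.0, 1.0, size=(20, 30))
females[0] += 2.0                              # one genuinely shifted marker

pvals = [mannwhitneyu(m, f).pvalue for m, f in zip(males, females)]
q = bh_adjust(pvals)
print(int(np.argmin(q)))  # marker 0 survives adjustment with the smallest q
```

The independent-test-set assessment the authors insist on would then be applied to any classifier built on the markers that survive adjustment.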

    Algebraic Comparison of Partial Lists in Bioinformatics

    The outcome of a functional genomics pipeline is usually a partial list of genomic features, ranked by their relevance in modelling the biological phenotype in terms of a classification or regression model. Due to resampling protocols, or simply within a meta-analysis comparison, it is often the case that sets of alternative feature lists (possibly of different lengths) are obtained instead of a single list. Here we introduce a method, based on the algebraic theory of symmetric groups, for studying the variability between lists ("list stability") in the case of lists of unequal length. We provide algorithms evaluating stability for lists embedded in the full feature set or limited to the features occurring in the partial lists. The method is demonstrated first on synthetic data in a gene filtering task and then for finding gene profiles on a recent prostate cancer dataset.

    Measurement of cosmic-ray reconstruction efficiencies in the MicroBooNE LArTPC using a small external cosmic-ray counter

    The MicroBooNE detector is a liquid argon time projection chamber at Fermilab designed to study short-baseline neutrino oscillations and neutrino-argon interaction cross sections. Due to its location near the surface, a good understanding of cosmic muons as a source of backgrounds is of fundamental importance for the experiment. We present a method of using an external 0.5 m (L) x 0.5 m (W) muon counter stack, installed above the main detector, to determine the cosmic-ray reconstruction efficiency in MicroBooNE. Data are acquired with this external muon counter stack placed in three different positions, corresponding to cosmic rays intersecting different parts of the detector. The data reconstruction efficiency of tracks in the detector is found to be $\epsilon_{\mathrm{data}} = (97.1 \pm 0.1~(\mathrm{stat}) \pm 1.4~(\mathrm{sys}))\%$, in good agreement with the Monte Carlo reconstruction efficiency $\epsilon_{\mathrm{MC}} = (97.4 \pm 0.1)\%$. This analysis represents a small-scale demonstration of the method that can be used with future data from a recently installed cosmic-ray tagger system, which will be able to tag $\approx 80\%$ of the cosmic rays passing through the MicroBooNE detector. (19 pages, 12 figures)
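As a rough illustration of how such a statistical uncertainty arises, the efficiency is a binomial proportion: the fraction of counter-tagged crossing muons whose track is found in the TPC. The counts below are invented for the sketch and are not MicroBooNE data:

```python
import math

def efficiency(n_reco, n_tagged):
    """Reconstruction efficiency and its binomial statistical uncertainty,
    for n_reco reconstructed tracks out of n_tagged tagged crossing muons."""
    eff = n_reco / n_tagged
    stat = math.sqrt(eff * (1.0 - eff) / n_tagged)  # binomial standard error
    return eff, stat

# Hypothetical counts chosen only to reproduce a 97.1% central value.
eff, stat = efficiency(9710, 10000)
print(f"eff = {100 * eff:.1f}% +/- {100 * stat:.1f}% (stat)")
```

The systematic uncertainty quoted in the paper comes from the measurement conditions (counter positions, selection, modelling) rather than from counting statistics, so it is not captured by this formula.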